Identifying "aboutness topics": two annotation experiments
Abstract
This paper deals with the annotation of “aboutness topic” (also known as “sentence topic”) in naturally occurring data. We report on two annotation experiments involving German newspaper texts: In experiment 1, based on the annotation guidelines by Götze et al. (2007), two expert annotators had to select the aboutness topic from among a small number of pre-defined choices, for a total of 588 sentence tokens. Although the results are disappointing in terms of inter-annotator agreement (with Fleiss’ κ between .19 and .57, depending on sentence category), they allowed us to identify and analyze cases that resist easy classification. Next, the original guidelines were tentatively modified, capturing insights from experiment 1. In a second experiment, based on the revised guidelines and following several training sessions, four raters annotated a total of 56 sentences, deciding for each NP whether or not it is an aboutness topic. Overall, the results are still disappointing from an inter-rater agreement perspective, but again, a closer look at the problematic cases is instructive as to the weaknesses of the current operationalizations of information structural notions and, perhaps most importantly, it reveals a number of basic theoretical questions that still await satisfactory answers.
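The agreement figures in the abstract are reported as Fleiss' κ. As a rough illustration of how that statistic is computed from a table of rater counts, here is a minimal sketch. The function name `fleiss_kappa` and the toy data are assumptions made for illustration only; they are not the authors' code or data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of category counts.

    counts[i][j] is the number of raters who put item i into category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_bar - p_e) / (1.0 - p_e)


# Invented toy data: 5 NPs, 4 raters each, columns = (topic, not topic).
toy_counts = [[4, 0], [3, 1], [2, 2], [0, 4], [1, 3]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean agreement below chance. The binary per-NP layout of the toy table mirrors the decision format described for experiment 2, but the counts themselves are made up.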
Similar resources
Topics as Speech Acts: An Analysis of Conditionals
In this paper we show that syntactic similarities and differences between normal indicative conditionals (henceforth: ICs) and biscuit conditionals (henceforth: BCs) with fronted antecedents correspond to syntactic similarities and differences between two kinds of topic marking constructions: aboutness topic marking via German Left Dislocation (henceforth: GLD), on the one hand, and the marking...
Simplex NPs Clustered by Head: A Method for Identifying Significant Topics Within a Document
This paper discusses 'head clustering', a novel, linguistically-motivated method for representing the aboutness of a document. First, a list of candidate significant topics consisting of all simplex NPs is extracted from the document. Next, these NPs are clustered by head. Finally, a significance measure is obtained by ranking frequency of heads: those NPs with heads that occur with greater freque...
Identification of Topic and Focus in Czech: Evaluation of Manual Parallel Annotations
This paper presents results of a control annotation of the Topic-Focus Articulation of Czech sentences based on the notion of “aboutness”. This is one of the steps testing the hypothesis about the relation between contextual boundness and “aboutness”. We suppose that the bipartition of the sentence into its Topic and Focus (“aboutness”) can be automatically derived from the values of contextual b...
More Reflections on "Aboutness" TREC-2001 Evaluation Experiments at Justsystem
The TREC-2001 Web track evaluation experiments at the Justsystem site are described with a focus on the “aboutness” based approach in text retrieval. In the web ad hoc task, our TREC-9 approach is adopted again, combining both pseudo-relevance feedback and reference database feedback but the setting is calibrated for an early precision preferred search. For the entry page finding task, we combi...
Different Alternatives for Topics and Foci: Evidence from Indefinites and Multiple wh
In gapping, topical indefinites as well as wh-phrases can contrast with surface-identical antecedents if the contrast involved is the first of the two (or more) contrast pairs in the gapping coordination. This is not possible for most other types of expressions. We argue that both topical indefinites and wh-phrases introduce a discourse referent with a fixed address, on the basis of which refer...
Journal title:
- D&D
Volume 4
Publication date 2013